Overview

Dataset statistics

Number of variables: 9
Number of observations: 813
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 57.3 KiB
Average record size in memory: 72.2 B

Variable types

Numeric: 8
Categorical: 1

Alerts

df_index is highly correlated with N and 6 other fields (High correlation)
N is highly correlated with df_index and 5 other fields (High correlation)
P is highly correlated with df_index and 4 other fields (High correlation)
K is highly correlated with df_index and 7 other fields (High correlation)
humidity is highly correlated with df_index and 7 other fields (High correlation)
rainfall is highly correlated with df_index and 6 other fields (High correlation)
label is highly correlated with df_index and 7 other fields (High correlation)
temperature is highly correlated with df_index and 4 other fields (High correlation)
ph is highly correlated with K and 3 other fields (High correlation)
df_index has unique values (Unique)

Reproduction

Analysis started: 2023-03-14 18:37:56.906504
Analysis finished: 2023-03-14 18:38:10.638439
Duration: 13.73 seconds
Software version: pandas-profiling v3.4.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 813
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 628.9766298
Minimum: 0
Maximum: 1640
Zeros: 1
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 6.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 40.6
Q1: 203
Median: 406
Q3: 1109
95-th percentile: 1577.4
Maximum: 1640
Range: 1640
Interquartile range (IQR): 906

Descriptive statistics

Standard deviation: 519.214069
Coefficient of variation (CV): 0.8254902399
Kurtosis: -1.035903468
Mean: 628.9766298
Median Absolute Deviation (MAD): 313
Skewness: 0.6289047422
Sum: 511358
Variance: 269583.2495
Monotonicity: Strictly increasing
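Several of the df_index figures are derived from the others, so they can be cross-checked by hand. The following is a minimal sketch, not part of the generated report, that recomputes the mean, range, IQR, coefficient of variation and variance from the values printed for df_index. The variable names are illustrative, and small differences in the last digits are expected because the inputs are already rounded.

```python
# Values copied from the df_index statistics in this report.
n_observations = 813
total = 511358
minimum, maximum = 0, 1640
q1, q3 = 203, 1109
std = 519.214069

mean = total / n_observations   # Sum / number of observations
value_range = maximum - minimum # Maximum - Minimum
iqr = q3 - q1                   # Q3 - Q1
cv = std / mean                 # Standard deviation / Mean
variance = std ** 2             # Standard deviation squared

print(f"mean     = {mean:.7f}")      # report: 628.9766298
print(f"range    = {value_range}")   # report: 1640
print(f"IQR      = {iqr}")           # report: 906
print(f"CV       = {cv:.10f}")       # report: 0.8254902399
print(f"variance = {variance:.4f}")  # report: 269583.2495
```

The same relationships hold for every numeric variable in the report.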
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
0                   1      0.1%
946                 1      0.1%
936                 1      0.1%
937                 1      0.1%
938                 1      0.1%
939                 1      0.1%
940                 1      0.1%
941                 1      0.1%
942                 1      0.1%
943                 1      0.1%
Other values (803)  803    98.8%

Minimum 10 values

Value  Count  Frequency (%)
0      1      0.1%
1      1      0.1%
2      1      0.1%
3      1      0.1%
4      1      0.1%
5      1      0.1%
6      1      0.1%
7      1      0.1%
8      1      0.1%
9      1      0.1%

Maximum 10 values

Value  Count  Frequency (%)
1640   1      0.1%
1639   1      0.1%
1638   1      0.1%
1637   1      0.1%
1636   1      0.1%
1635   1      0.1%
1634   1      0.1%
1633   1      0.1%
1632   1      0.1%
1631   1      0.1%

N
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 100
Distinct (%): 12.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 42.52521525
Minimum: 0
Maximum: 100
Zeros: 8
Zeros (%): 1.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 5
Q1: 22
Median: 35
Q3: 67
95-th percentile: 93
Maximum: 100
Range: 100
Interquartile range (IQR): 45

Descriptive statistics

Standard deviation: 28.35881834
Coefficient of variation (CV): 0.6668706595
Kurtosis: -1.010028628
Mean: 42.52521525
Median Absolute Deviation (MAD): 22
Skewness: 0.4595782639
Sum: 34573
Variance: 804.2225777
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value              Count  Frequency (%)
27                 26     3.2%
60                 19     2.3%
32                 19     2.3%
22                 18     2.2%
35                 18     2.2%
40                 18     2.2%
28                 18     2.2%
25                 17     2.1%
37                 17     2.1%
31                 16     2.0%
Other values (90)  627    77.1%

Minimum 10 values

Value  Count  Frequency (%)
0      8      1.0%
1      8      1.0%
2      9      1.1%
3      10     1.2%
4      4      0.5%
5      12     1.5%
6      14     1.7%
7      10     1.2%
8      9      1.1%
9      13     1.6%

Maximum 10 values

Value  Count  Frequency (%)
100    1      0.1%
99     10     1.2%
98     6      0.7%
97     5      0.6%
96     2      0.2%
95     6      0.7%
94     7      0.9%
93     8      1.0%
92     5      0.6%
91     9      1.1%

P
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 72
Distinct (%): 8.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 77.63837638
Minimum: 35
Maximum: 145
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.5 KiB

Quantile statistics

Minimum: 35
5-th percentile: 38
Q1: 55
Median: 66
Q3: 80
95-th percentile: 141
Maximum: 145
Range: 110
Interquartile range (IQR): 25

Descriptive statistics

Standard deviation: 33.82972581
Coefficient of variation (CV): 0.4357345862
Kurtosis: -0.6805364156
Mean: 77.63837638
Median Absolute Deviation (MAD): 12
Skewness: 0.8817401804
Sum: 63120
Variance: 1144.450348
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value              Count  Frequency (%)
60                 33     4.1%
72                 26     3.2%
58                 26     3.2%
55                 24     3.0%
56                 22     2.7%
59                 21     2.6%
57                 20     2.5%
67                 18     2.2%
62                 18     2.2%
35                 17     2.1%
Other values (62)  588    72.3%

Minimum 10 values

Value  Count  Frequency (%)
35     17     2.1%
36     8      1.0%
37     7      0.9%
38     11     1.4%
39     7      0.9%
40     6      0.7%
41     10     1.2%
42     7      0.9%
43     11     1.4%
44     14     1.7%

Maximum 10 values

Value  Count  Frequency (%)
145    8      1.0%
144    12     1.5%
143    10     1.2%
142    7      0.9%
141    7      0.9%
140    12     1.5%
139    12     1.5%
138    10     1.2%
137    7      0.9%
136    10     1.2%

K
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 44
Distinct (%): 5.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 77.2902829
Minimum: 15
Maximum: 205
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.5 KiB

Quantile statistics

Minimum: 15
5-th percentile: 16
Q1: 21
Median: 39
Q3: 85
95-th percentile: 203
Maximum: 205
Range: 190
Interquartile range (IQR): 64

Descriptive statistics

Standard deviation: 73.14222508
Coefficient of variation (CV): 0.9463314447
Kurtosis: -0.8614120092
Mean: 77.2902829
Median Absolute Deviation (MAD): 22
Skewness: 0.9393231055
Sum: 62837
Variance: 5349.78509
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=44)
Common values

Value              Count  Frequency (%)
17                 41     5.0%
19                 38     4.7%
22                 37     4.6%
23                 35     4.3%
20                 34     4.2%
21                 32     3.9%
18                 30     3.7%
24                 27     3.3%
16                 26     3.2%
197                24     3.0%
Other values (34)  489    60.1%

Minimum 10 values

Value  Count  Frequency (%)
15     22     2.7%
16     26     3.2%
17     41     5.0%
18     30     3.7%
19     38     4.7%
20     34     4.2%
21     32     3.9%
22     37     4.6%
23     35     4.3%
24     27     3.3%

Maximum 10 values

Value  Count  Frequency (%)
205    18     2.2%
204    22     2.7%
203    22     2.7%
202    14     1.7%
201    18     2.2%
200    14     1.7%
199    14     1.7%
198    15     1.8%
197    24     3.0%
196    21     2.6%

temperature
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 788
Distinct (%): 96.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.58808318
Minimum: 8.825674745
Maximum: 41.94865736
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.5 KiB

Quantile statistics

Minimum: 8.825674745
5-th percentile: 16.51783455
Q1: 19.33162606
Median: 22.05592283
Q3: 24.71417533
95-th percentile: 33.37036145
Maximum: 41.94865736
Range: 33.12298261
Interquartile range (IQR): 5.38254927

Descriptive statistics

Standard deviation: 5.047262483
Coefficient of variation (CV): 0.2234480209
Kurtosis: 2.186847293
Mean: 22.58808318
Median Absolute Deviation (MAD): 2.71827207
Skewness: 0.9893502923
Sum: 18364.11163
Variance: 25.47485857
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
19.52226241         2      0.2%
23.46168338         2      0.2%
20.76952209         2      0.2%
22.64236876         2      0.2%
15.53834801         2      0.2%
17.00067625         2      0.2%
19.91853092         2      0.2%
18.15300153         2      0.2%
20.16080524         2      0.2%
21.53989176         2      0.2%
Other values (778)  793    97.5%

Minimum 10 values

Value        Count  Frequency (%)
8.825674745  1      0.1%
9.467960445  1      0.1%
9.535585543  1      0.1%
9.724457611  1      0.1%
9.851242629  1      0.1%
9.949929082  1      0.1%
10.38004759  1      0.1%
10.72302459  1      0.1%
10.89875873  1      0.1%
11.02105378  1      0.1%

Maximum 10 values

Value        Count  Frequency (%)
41.94865736  1      0.1%
41.65602996  1      0.1%
41.36106301  1      0.1%
41.20733624  1      0.1%
41.18664903  1      0.1%
40.66012294  1      0.1%
39.70772192  1      0.1%
39.64851881  1      0.1%
39.37102553  1      0.1%
39.06555518  1      0.1%

humidity
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 769
Distinct (%): 94.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 56.96614304
Minimum: 14.25803981
Maximum: 94.92048112
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.5 KiB

Quantile statistics

Minimum: 14.25803981
5-th percentile: 15.80224812
Q1: 22.33195853
Median: 65.34583901
Q3: 82.45687182
95-th percentile: 92.72604913
Maximum: 94.92048112
Range: 80.66244131
Interquartile range (IQR): 60.12491329

Descriptive statistics

Standard deviation: 28.78614121
Coefficient of variation (CV): 0.5053201722
Kurtosis: -1.563963712
Mean: 56.96614304
Median Absolute Deviation (MAD): 19.41335527
Skewness: -0.3036296928
Sum: 46313.47429
Variance: 828.641926
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
20.4555596          2      0.2%
24.25386207         2      0.2%
19.68751084         2      0.2%
57.68272924         2      0.2%
24.54038287         2      0.2%
24.96881755         2      0.2%
18.93146941         2      0.2%
66.50415474         2      0.2%
23.75560241         2      0.2%
23.22197648         2      0.2%
Other values (759)  793    97.5%

Minimum 10 values

Value        Count  Frequency (%)
14.25803981  1      0.1%
14.27327988  1      0.1%
14.2804191   1      0.1%
14.32313811  1      0.1%
14.33847406  1      0.1%
14.42457525  1      0.1%
14.44008871  1      0.1%
14.44228303  1      0.1%
14.62313811  1      0.1%
14.69765308  1      0.1%

Maximum 10 values

Value        Count  Frequency (%)
94.92048112  1      0.1%
94.89613443  1      0.1%
94.76285385  1      0.1%
94.73763514  1      0.1%
94.71203306  1      0.1%
94.67695747  1      0.1%
94.58900601  1      0.1%
94.58075845  1      0.1%
94.5764581   1      0.1%
94.54128292  1      0.1%

ph
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 758
Distinct (%): 93.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.308703046
Minimum: 4.548202098
Maximum: 8.967057762
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.5 KiB

Quantile statistics

Minimum: 4.548202098
5-th percentile: 5.279523912
Q1: 5.732453638
Median: 6.112305667
Q3: 6.655918078
95-th percentile: 7.98156501
Maximum: 8.967057762
Range: 4.418855664
Interquartile range (IQR): 0.92346444

Descriptive statistics

Standard deviation: 0.8365354135
Coefficient of variation (CV): 0.1326002203
Kurtosis: 0.6629709333
Mean: 6.308703046
Median Absolute Deviation (MAD): 0.422239979
Skewness: 0.9210569124
Sum: 5128.975576
Variance: 0.6997914981
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
7.152811172         2      0.2%
7.228963452         2      0.2%
7.690962338         2      0.2%
5.509295379         2      0.2%
6.481783043         2      0.2%
5.945465949         2      0.2%
6.515499549         2      0.2%
6.391173589         2      0.2%
5.988992796         2      0.2%
5.502999119         2      0.2%
Other values (748)  793    97.5%

Minimum 10 values

Value        Count  Frequency (%)
4.548202098  1      0.1%
4.567446499  1      0.1%
4.603563116  1      0.1%
4.608695247  1      0.1%
4.672437054  1      0.1%
4.674941549  1      0.1%
4.681576043  1      0.1%
4.684079249  1      0.1%
4.696518678  1      0.1%
4.697750704  1      0.1%

Maximum 10 values

Value        Count  Frequency (%)
8.967057762  1      0.1%
8.931756558  1      0.1%
8.868741443  1      0.1%
8.861479668  1      0.1%
8.829273328  1      0.1%
8.766128654  1      0.1%
8.753795334  2      0.2%
8.736337905  1      0.1%
8.719960893  1      0.1%
8.718192847  2      0.2%

rainfall
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 788
Distinct (%): 96.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 111.584416
Minimum: 5.31450727
Maximum: 298.5601175
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.5 KiB

Quantile statistics

Minimum: 5.31450727
5-th percentile: 60.501239
Q1: 73.0926704
Median: 96.65888933
Q3: 125.0972687
95-th percentile: 245.3557502
Maximum: 298.5601175
Range: 293.2456102
Interquartile range (IQR): 52.0045983

Descriptive statistics

Standard deviation: 59.42186943
Coefficient of variation (CV): 0.5325283901
Kurtosis: 1.174180182
Mean: 111.584416
Median Absolute Deviation (MAD): 25.33935863
Skewness: 1.119103291
Sum: 90718.13024
Variance: 3530.958567
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
95.17028129         2      0.2%
125.0972687         2      0.2%
75.45328039         2      0.2%
122.3886015         2      0.2%
62.21292186         2      0.2%
107.6907964         2      0.2%
103.2926407         2      0.2%
113.334026          2      0.2%
95.84253438         2      0.2%
105.4120514         2      0.2%
Other values (778)  793    97.5%

Minimum 10 values

Value        Count  Frequency (%)
5.31450727   1      0.1%
5.370175667  1      0.1%
5.408681786  1      0.1%
5.686167788  1      0.1%
5.861398642  1      0.1%
6.00080568   1      0.1%
6.018627178  1      0.1%
6.124709117  1      0.1%
6.15393208   1      0.1%
6.250660556  1      0.1%

Maximum 10 values

Value        Count  Frequency (%)
298.5601175  1      0.1%
298.4018471  1      0.1%
295.9248796  1      0.1%
295.6094492  1      0.1%
291.2986618  1      0.1%
290.6793783  1      0.1%
287.5766935  1      0.1%
286.5083725  1      0.1%
285.2493645  1      0.1%
284.4364567  1      0.1%

label
Categorical

HIGH CORRELATION

Distinct: 7
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 6.5 KiB

Value             Count
rice              139
Soyabeans         130
beans             125
maize             119
peas              100
Other values (2)  200

Length

Max length: 9
Median length: 6
Mean length: 5.468634686
Min length: 4

Characters and Unicode

Total characters: 4446
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
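
The character totals for label can be reproduced from the per-category counts alone. The sketch below is illustrative and not part of the generated report: it weights each character of each label by the number of rows carrying that label, using the counts listed under Common Values. The names `label_counts` and `char_counts` are made up for the example.

```python
from collections import Counter

# Row counts per label, copied from the Common Values table of this report.
label_counts = {
    "rice": 139,
    "Soyabeans": 130,
    "beans": 125,
    "maize": 119,
    "peas": 100,
    "apple": 100,
    "grapes": 100,
}

# Each character contributes once per row that carries the label.
char_counts = Counter()
for label, rows in label_counts.items():
    for ch in label:
        char_counts[ch] += rows

n_rows = sum(label_counts.values())
total_chars = sum(char_counts.values())
mean_length = total_chars / n_rows

print(n_rows)                      # 813 observations
print(total_chars)                 # 4446 total characters
print(len(char_counts))            # 16 distinct characters
print(char_counts.most_common(4))  # [('e', 813), ('a', 804), ('s', 455), ('p', 400)]
print(round(mean_length, 9))       # 5.468634686
```

Upper-case "S" and lower-case "s" are counted separately, which is why the report lists two character categories.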

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: rice
2nd row: rice
3rd row: rice
4th row: rice
5th row: rice

Common Values

Value      Count  Frequency (%)
rice       139    17.1%
Soyabeans  130    16.0%
beans      125    15.4%
maize      119    14.6%
peas       100    12.3%
apple      100    12.3%
grapes     100    12.3%

Length

Histogram of lengths of the category

Category Frequency Plot

Value      Count  Frequency (%)
rice       139    17.1%
soyabeans  130    16.0%
beans      125    15.4%
maize      119    14.6%
peas       100    12.3%
apple      100    12.3%
grapes     100    12.3%

Most occurring characters

Value             Count  Frequency (%)
e                 813    18.3%
a                 804    18.1%
s                 455    10.2%
p                 400    9.0%
i                 258    5.8%
b                 255    5.7%
n                 255    5.7%
r                 239    5.4%
c                 139    3.1%
S                 130    2.9%
Other values (6)  698    15.7%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  4316   97.1%
Uppercase Letter  130    2.9%

Most frequent character per category

Lowercase Letter
Value             Count  Frequency (%)
e                 813    18.8%
a                 804    18.6%
s                 455    10.5%
p                 400    9.3%
i                 258    6.0%
b                 255    5.9%
n                 255    5.9%
r                 239    5.5%
c                 139    3.2%
o                 130    3.0%
Other values (5)  568    13.2%
Uppercase Letter
Value  Count  Frequency (%)
S      130    100.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  4446   100.0%

Most frequent character per script

Latin
Value             Count  Frequency (%)
e                 813    18.3%
a                 804    18.1%
s                 455    10.2%
p                 400    9.0%
i                 258    5.8%
b                 255    5.7%
n                 255    5.7%
r                 239    5.4%
c                 139    3.1%
S                 130    2.9%
Other values (6)  698    15.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4446   100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
e                 813    18.3%
a                 804    18.1%
s                 455    10.2%
p                 400    9.0%
i                 258    5.8%
b                 255    5.7%
n                 255    5.7%
r                 239    5.4%
c                 139    3.1%
S                 130    2.9%
Other values (6)  698    15.7%

Interactions


Correlations


Auto

The auto setting is an easily interpretable pairwise column metric that chooses a method from the types of the two columns (vartype-vartype: method):
categorical-categorical: Cramér's V
numerical-categorical: Cramér's V (using a discretized numerical column)
numerical-numerical: Spearman's ρ
This configuration uses the most suitable method for each pair of columns.
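
The mapping can be pictured as a small lookup keyed on the unordered pair of column types. This is a hypothetical sketch for illustration only: `AUTO_METHODS` and `choose_method` are made-up names and do not reflect how pandas-profiling implements the setting internally.

```python
# Hypothetical lookup mirroring the mapping described in the text.
# frozenset keys make the pair order irrelevant.
AUTO_METHODS = {
    frozenset(["categorical"]): "Cramér's V",
    frozenset(["numerical", "categorical"]): "Cramér's V (discretized numerical column)",
    frozenset(["numerical"]): "Spearman's ρ",
}


def choose_method(type_a: str, type_b: str) -> str:
    """Return the correlation method the auto setting would pick for two column types."""
    return AUTO_METHODS[frozenset([type_a, type_b])]


# In this report, N is numerical and label is categorical.
print(choose_method("numerical", "numerical"))
print(choose_method("numerical", "categorical"))
print(choose_method("categorical", "numerical"))
```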

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
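
The calculation described above can be sketched in plain Python: rank both variables (giving tied values their average rank, an assumption the text does not spell out), then apply the covariance-over-standard-deviations formula to the ranks. The function names and toy data are illustrative and not taken from the report.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship: rho is 1 (up to rounding), r is below 1.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y))
print(pearson_r(x, y))
```

The toy data shows why the text calls ρ better at catching nonlinear monotonic relationships: the ranks of x and y agree exactly even though the raw values do not lie on a line.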

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
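
As a minimal sketch of that formula, the snippet below computes r in plain Python and checks the invariance under changes of location and scale mentioned above. The function name and the toy data are illustrative assumptions, not values from the report.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/n factors cancel in the ratio; they are kept to mirror the definition.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Shifting and rescaling x (positive scale) leaves r unchanged.
r_rescaled = pearson_r([2 * a + 3 for a in x], y)
# An exact linear relationship gives r = 1 regardless of slope.
r_line = pearson_r(x, [3 * a - 7 for a in x])

print(r, r_rescaled, r_line)
```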

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
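
Read literally, that description corresponds to the variant usually called tau-a. The sketch below implements exactly that counting rule in plain Python; it is illustrative only, and library implementations commonly use the tie-adjusted tau-b instead, so results can differ when the data contains ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # The sign of the product tells whether x and y move in the same direction.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0 means a tie in x or y: neither concordant nor discordant.
    return (concordant - discordant) / len(pairs)


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
# 6 pairs in total: 5 concordant, 1 discordant (the swapped middle values).
tau = kendall_tau_a(x, y)
print(tau)  # (5 - 1) / 6
```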

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

(index)  df_index  N   P   K   temperature  humidity   ph        rainfall    label
0        0         90  42  43  20.879744    82.002744  6.502985  202.935536  rice
1        1         85  58  41  21.770462    80.319644  7.038096  226.655537  rice
2        2         60  55  44  23.004459    82.320763  7.840207  263.964248  rice
3        3         74  35  40  26.491096    80.158363  6.980401  242.864034  rice
4        4         78  42  42  20.130175    81.604873  7.628473  262.717340  rice
5        5         69  37  42  23.058049    83.370118  7.073454  251.055000  rice
6        6         69  55  38  22.708838    82.639414  5.700806  271.324860  rice
7        7         94  53  40  20.277744    82.894086  5.718627  241.974195  rice
8        8         89  54  38  24.515881    83.535216  6.685346  230.446236  rice
9        9         68  58  38  23.223974    83.033227  6.336254  221.209196  rice

Last rows

(index)  df_index  N   P   K   temperature  humidity   ph        rainfall    label
803      1631      0   65  15  23.461683    23.221976  5.645436  95.842534   beans
804      1632      13  72  21  24.321166    21.027867  5.821194  60.275525   beans
805      1633      34  60  23  20.125741    24.969699  5.659255  100.049718  beans
806      1634      9   80  19  21.806196    18.570866  5.945466  125.097269  beans
807      1635      11  72  20  19.522262    24.926072  5.951177  113.334026  beans
808      1636      3   67  24  17.000676    19.907905  5.520880  103.292641  beans
809      1637      35  69  23  16.787915    24.968818  5.578410  75.453280   beans
810      1638      3   77  25  24.849062    22.894646  5.608165  62.212922   beans
811      1639      23  62  19  16.517835    20.455560  5.609435  98.777942   beans
812      1640      22  71  17  18.153002    19.386021  5.509295  107.690796  beans